Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use proper fields for finding prunable jobs #1145

Merged
merged 4 commits into from
Aug 28, 2024

Conversation

ademenev
Copy link
Contributor

Using "scheduled_at" is not correct. Depending on job state, one of "cancelled_at", "discarded_at" or "completed_at" should be used.

Using "scheduled_at" is not correct. Depending on job state, one of
"cancelled_at", "discarded_at" or "completed_at" should be used.
@@ -153,9 +153,10 @@ defmodule Oban.Engines.Basic do
subquery =
queryable
|> select([:id, :queue, :state])
|> where([j], j.state in ~w(completed cancelled discarded))
|> where([j], j.state == "completed" and j.completed_at < ^time)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We switched to the scheduled_at timestamp to make use of the compound index and speed up pruning. There are likely to be many more completed jobs than cancelled or discarded, and a completed job has to execute to end up in that state. So, let's use scheduled_at for the completed state to make use of that index.

Suggested change
|> where([j], j.state == "completed" and j.completed_at < ^time)
|> where([j], j.state == "completed" and j.scheduled_at < ^time)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that makes perfect sense performance-wise

If you do not do some tricky stuff, completed, cancelled and discarded rows never change. Probably they could all use a single timestamp field instead of 3, with a compound index on state and that field

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this redundant? queue column is not nullable

where([j], not is_nil(j.queue))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The is_nil check is pointless with the current index structure, but it was required to force index usage with the old compound table where queue was before state. The index change was optional (way back when), and some people never chose that option.

We can probably remove it at this point, but it doesn't change the query plan.

lib/oban/engines/lite.ex Outdated Show resolved Hide resolved
.vscode/settings.json Outdated Show resolved Hide resolved
@sorentwo sorentwo merged commit 861dfa5 into oban-bg:main Aug 28, 2024
@sorentwo
Copy link
Member

Thanks for submitting this improvement!

@sorentwo sorentwo added kind:bug Something isn't working area:oss Related to Oban OSS labels Aug 28, 2024
@ademenev
Copy link
Contributor Author

Would you consider this?

If you do not do some tricky stuff, completed, cancelled and discarded rows never change. Probably they could all use a single timestamp field instead of 3, with a compound index on state and that field

May be a bit of work and require a heavy migration though

@ademenev ademenev deleted the fix_prining_timestamps branch August 28, 2024 18:46
@sorentwo
Copy link
Member

Would you consider this?

In theory, yes. It would require a few migrations for zero downtime and coordinated changes between packages. That's a lot and I'm not convinced it's worth it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:oss Related to Oban OSS kind:bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants